Morphological Analyzer for Affix Stacking Languages: A Case Study of Marathi

Authors

  • Raj Dabre
  • Archana Amberkar
  • Pushpak Bhattacharyya
Abstract

In this paper we describe and evaluate a Finite State Machine (FSM) based Morphological Analyzer (MA) for Marathi, a highly inflectional language with agglutinative suffixes. Marathi belongs to the Indo-European family and is considerably influenced by Dravidian languages. Adroit handling of participial constructions and other derived forms (Krudantas and Taddhitas), in addition to inflected forms, is crucial to NLP and MT of Marathi. We first describe Marathi morphological phenomena, detailing the complexities of inflectional and derivational morphology, and then go into the construction and working of the MA. The MA produces the root word and its features. A thorough evaluation against gold standard data establishes the efficacy of this MA. To the best of our knowledge, this work is the first systematic and exhaustive study of the morphotactics of a suffix-stacking language, leading to a high-quality morphological analyzer. The system forms part of a Marathi-Hindi transfer-based machine translation system. The methodology delineated in the paper can be replicated for other languages that show suffix-stacking behaviour similar to that of Marathi.
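The idea of an FSM that walks over stacked suffixes and returns a root plus features can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the state names, the feature labels, and the Latin transliteration (`gharasamorcha` for "the one in front of the house") are all assumptions chosen for illustration, and a real analyzer would need paradigm tables and stem alternation rules.

```python
# Toy suffix-stacking analyzer. All data below is hypothetical and
# only illustrates the general FSM idea described in the abstract.

# Root lexicon: root -> part of speech (assumed entries).
ROOTS = {"ghar": "noun"}

# Morphotactics: state -> {suffix: (feature label, next state)}.
TRANSITIONS = {
    "ROOT": {"a": ("oblique", "OBLIQUE")},
    "OBLIQUE": {
        "samor": ("postposition:samor", "POSTP"),
        "cha": ("adjectival:cha", "FINAL"),
    },
    "POSTP": {"cha": ("adjectival:cha", "FINAL")},
}

# States in which a word may end. A bare oblique stem is not accepted.
ACCEPTING = {"ROOT", "POSTP", "FINAL"}


def _walk(rest, state, feats, root, pos, results):
    """Consume suffixes left to right, recording every accepting path."""
    if not rest:
        if state in ACCEPTING:
            results.append({"root": root, "pos": pos, "features": feats})
        return
    for suffix, (label, nxt) in TRANSITIONS.get(state, {}).items():
        if rest.startswith(suffix):
            _walk(rest[len(suffix):], nxt, feats + [label], root, pos, results)


def analyze(word):
    """Return all analyses of `word` as dicts of root, pos, and features."""
    results = []
    for root, pos in ROOTS.items():
        if word.startswith(root):
            _walk(word[len(root):], "ROOT", [], root, pos, results)
    return results


if __name__ == "__main__":
    print(analyze("gharasamorcha"))
```

Each suffix moves the machine to a new state, so the transition table alone decides which suffixes may stack on which; adding a suffix class means adding transitions rather than changing the traversal code.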

Similar resources

A Paradigm-Based Finite State Morphological Analyzer for Marathi

A morphological analyzer forms the foundation for many NLP applications of Indian Languages. In this paper, we propose and evaluate the morphological analyzer for Marathi, an inflectional language. The morphological analyzer exploits the efficiency and flexibility offered by finite state machines in modeling the morphotactics while using the well devised system of paradigms to handle the stem a...

Conversion of Procedural Morphologies to Finite-State Morphologies: A Case Study of Arabic

In this paper we describe a conversion of the Buckwalter Morphological Analyzer for Arabic, originally written as a Perl-script, into a pure finite-state morphological analyzer. Representing a morphological analyzer as a finite-state transducer (FST) confers many advantages over running a procedural affix-matching algorithm. Apart from application speed, an FST representation immediately offers...

An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both ag...

Frequent Case Generation in Ad Hoc Retrieval of Three Indian Languages - Bengali, Gujarati and Marathi

This paper presents results of a generative method for the management of morphological variation of query keywords in Bengali, Gujarati and Marathi. The method is called Frequent Case Generation (FCG). It is based on the skewed distributions of word forms in natural languages and is suitable for languages that have either fair amount of morphological variation or are morphologically very rich. ...

An Affix Stripping Morphological Analyzer for Turkish

This paper presents the design and the implementation of a morphological analyzer for Turkish. A new methodology is proposed for doing the analysis of Turkish words with an affix stripping approach and without using any lexicon. The rule-based and agglutinative structure of the language allows Turkish to be modeled with finite state machines (FSMs). In contrast to the previous works, in this st...

Publication date: 2012